ABSTRACT


The following model is considered by Starr (1972): At most $n$ tosses of a coin,
having a constant probability $p$ of coming up heads, are made. After each toss
we have the option of either stopping and receiving an amount equal to the
length of the terminal run of heads (that is, if we were on a streak of $k$ heads
in the last $k$ tosses, then we could stop and receive $k$), or of paying an
amount $c$ and tossing the coin again. When $n$ tosses have already been made,
we must stop.


The purpose of this note is to point out that with a simple modification the
above problem fits the framework in which a one-stage look ahead policy is
optimal. This not only yields an easy solution to the problem but also provides
much insight. For instance, the reason for the additivity of the optimal
continuation boundary, which is commented on by Starr (1972) on page 1890, now
becomes clear. Also, the problem may be generalized so that the terminal payoff
is a more general function of the terminal run of heads, which may even depend
on the number of tosses made.
1. INTRODUCTION

The following model is considered by Starr (1972): At most $n$ tosses of a coin,
having a constant probability $p$ of coming up heads, are made. After each toss
we have the option of either stopping and receiving an amount equal to the
length of the terminal run of heads (that is, if we were on a streak of $k$ heads
in the last $k$ tosses, then we could stop and receive $k$), or of paying an
amount $c$ and tossing the coin again. When $n$ tosses have already been made,
we must stop.

The purpose of this note is to point out that with a simple modification the
above problem fits the framework in which a one-stage look ahead policy is
optimal. This not only yields an easy solution to the problem but also provides
much insight. For instance, the reason for the additivity of the optimal
continuation boundary, which is commented on by Starr (1972) on page 1890, now
becomes clear. Also, the problem may be generalized so that the terminal payoff
is a more general function of the terminal run of heads, which may even depend
on the number of tosses made.


2. THE  OPTIMAL  POLICY 

Consider the above problem with the exception that the return when we stop
after a terminal run of $r$ heads is $f(r)$, where $f(r)$ is such that

$$f(r) - p f(r + 1)$$

is nondecreasing in $r$.

Define $V_n$ to be the value to the decision-maker if he is allowed to make at
most $n$ tosses before stopping and when he employs an optimal strategy, and note
that $V_n$ is nondecreasing in $n$. Say that the process is in state $(r,j)$ if we
are on a run of $r$ heads and we are allowed at most $j$ more coin tosses.
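The values $V_n$ can be computed by backward induction over the states $(r,j)$. The following is a minimal sketch under stated assumptions: the function name `game_values` and the table `W` are introduced here for illustration and do not appear in the original, and exact rational arithmetic is used only to avoid rounding at ties.

```python
from fractions import Fraction


def game_values(n, p, c, f):
    """Return [V_0, ..., V_n], where V_j is the optimal value when the
    current head run is 0 and at most j tosses remain.

    W[j][r] holds the optimal value in state (r, j); only runs
    r <= n - j are reachable, so each row is one entry shorter.
    """
    # With no tosses left (j = 0) we must stop and collect f(r).
    W = [[f(r) for r in range(n + 1)]]
    for j in range(1, n + 1):
        prev = W[-1]
        row = []
        for r in range(n - j + 1):
            # Pay c, then move to (r + 1, j - 1) on a head, (0, j - 1) on a tail.
            cont = p * prev[r + 1] + (1 - p) * prev[0] - c
            row.append(max(f(r), cont))
        W.append(row)
    return [row[0] for row in W]


# Illustrative parameters (assumed, not from the original): f(r) = r, p = 1/2, c = 1/10.
values = game_values(3, Fraction(1, 2), Fraction(1, 10), lambda r: Fraction(r))
print(values)
```

For these parameters the values are 0, 2/5, 3/5 and 3/4, which illustrates that $V_n$ is nondecreasing in $n$.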

Now let us consider a modified problem which is such that when we are in any
state of the form $(0,j)$, $j \ge 0$, we are forced to stop and we receive a
terminal reward $V_j$. (That is, whenever a tail occurs we must stop but we are
paid as if we acted optimally from this point on.) In this modified problem if we
stop when in state $(r,j)$ we receive $f(r)$, while if we continue for exactly one
more toss and then stop then our expected return is
$p f(r + 1) + (1 - p)V_{j-1} - c$. Hence, the one-stage look ahead policy (see
Derman and Sacks (1960), Chow and Robbins (1961) or [3], pp. 137-138) is to stop
at state $(r,j)$ either if $r = 0$ or if $r \ne 0$ and

$$f(r) \ge p f(r + 1) + (1 - p)V_{j-1} - c .$$

As the set of stopping states just defined is closed, in the sense that once
entered it is never left, it follows that the one-stage look ahead policy is
optimal for this modified problem (in the terminology of [1] we are in the
monotone case).
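The closedness claim can be checked in one line from the two monotonicity facts already stated. In the sketch below, $g(r,j)$ is a symbol introduced here (it is not in the original) for the margin by which stopping beats taking exactly one more toss; a state with $r > 0$ is a stopping state exactly when $g(r,j) \ge 0$.

```latex
\begin{align*}
g(r,j) &:= f(r) - p f(r+1) - (1-p)V_{j-1} + c, \\
g(r+1,j-1) - g(r,j)
  &= \bigl[f(r+1) - p f(r+2)\bigr] - \bigl[f(r) - p f(r+1)\bigr]
     + (1-p)\bigl(V_{j-1} - V_{j-2}\bigr) \;\ge\; 0 .
\end{align*}
```

The difference of the bracketed terms is nonnegative because $f(r) - p f(r+1)$ is nondecreasing in $r$, and the last term is nonnegative because $V_n$ is nondecreasing in $n$. Hence from a stopping state $(r,j)$ a head leads to $(r+1,j-1)$, which is again a stopping state, while a tail leads to $(0,j-1)$, where stopping is forced.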

As an optimal policy for the modified problem clearly cannot lead to nonoptimal
actions in states $(r,j)$, $r > 0$, for the original problem, it remains only to
determine the optimal actions at states of the form $(0,j)$. To do so we fix $j$
and consider a modified problem, allowing at most $j$ flips and such that we are
forced to stop whenever we enter a state $(0,i)$ with $i < j$, and we receive
a terminal reward $V_i$. The one-stage look ahead policy for this problem (which,
as before, is easily shown to be optimal) calls for stopping at $(0,j)$ if

$$f(0) \ge p f(1) + (1 - p)V_{j-1} - c .$$


Combining this with our previous results shows that for the original problem
it is optimal to stop at $(r,j)$ if and only if

$$f(r) - p f(r + 1) \ge (1 - p)V_{j-1} - c .$$
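As a worked special case, take the payoff $f(r) = r$ of the introduction and assume $0 < p < 1$. The criterion then rearranges into a threshold on $r$ equal to $V_{j-1}$ plus a constant that does not depend on $j$. This is one way to read the additivity of the continuation boundary mentioned in the introduction; that reading is an interpretation offered here, not a statement taken from Starr (1972).

```latex
\begin{align*}
r - p(r+1) &\ge (1-p)V_{j-1} - c \\
\iff\quad (1-p)\,r &\ge (1-p)V_{j-1} + p - c \\
\iff\quad r &\ge V_{j-1} + \frac{p - c}{1 - p} .
\end{align*}
```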

If $f(0) = 0$, then the stopping criterion for the original problem states that
we should stop if and only if the present payoff ($f(r)$) is at least the
expected payoff if we make exactly one more toss ($p f(r + 1) - c$) plus $1 - p$
times the value of a new game which allows at most $j - 1$ tosses
($(1 - p)V_{j-1}$).
3. A GENERALIZATION

The problem can be generalized to allow the terminal reward to depend not only
on the length of the terminal run of heads but also on the number of tosses
taken. That is, assuming that we can initially make at most $n$ tosses, the
return if we stop when in state $(r,j)$ would be some function $f(r, n - j)$,
$j \le n$. If the function $f(r,i)$ satisfies

$$
\begin{aligned}
f(r,i) &\ge f(r, i + 1), \\
f(r + 1, i + 1) - p f(r + 2, i + 2) &\ge f(r,i) - p f(r + 1, i + 1),
\end{aligned}
\tag{1}
$$

then it can be shown by the same method as used in Section 2 that it is optimal
to stop at $(r,j)$ if and only if

$$f(r, n - j) \ge p f(r + 1, n - j + 1) + (1 - p)V_n(j - 1) - c ,$$

where $V_n(j)$ is the conditional expected return under an optimal policy from
time $n - j$ onward given that the head run is of length zero after $n - j$
tosses. An example of a terminal reward satisfying (1) is $f(r,i) = r/i$,
$r \le i$. In words, the terminal reward would equal the terminal head run
divided by the number of tosses made.
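The generalized criterion can be checked numerically for the example $f(r,i) = r/i$ in the same way as in Section 2. This is a sketch under assumptions: the names `payoff` and `generalized_rule_matches_dp` are introduced here, the parameters in the usage line are illustrative, and the reward at $i = 0$ (no tosses made yet), which the example leaves undefined, is assumed to be 0.

```python
from fractions import Fraction


def payoff(r, i):
    """Example reward f(r, i) = r / i. The value 0 at i = 0 is an
    assumption made here; the original does not define it."""
    return Fraction(r, i) if i > 0 else Fraction(0)


def generalized_rule_matches_dp(n, p, c, f=payoff):
    """Compare the dynamic-programming stopping decision with the criterion
    f(r, n-j) >= p f(r+1, n-j+1) + (1-p) V_n(j-1) - c in every reachable
    state (r, j) with j >= 1."""
    # W[j][r]: optimal value in state (r, j); n - j tosses have been made.
    W = [[f(r, n) for r in range(n + 1)]]
    for j in range(1, n + 1):
        prev = W[-1]
        W.append([max(f(r, n - j), p * prev[r + 1] + (1 - p) * prev[0] - c)
                  for r in range(n - j + 1)])
    for j in range(1, n + 1):
        v_prev = W[j - 1][0]  # V_n(j - 1): value after a tail
        for r in range(n - j + 1):
            cont = p * W[j - 1][r + 1] + (1 - p) * v_prev - c
            dp_stop = f(r, n - j) >= cont
            rule_stop = (f(r, n - j)
                         >= p * f(r + 1, n - j + 1) + (1 - p) * v_prev - c)
            if dp_stop != rule_stop:
                return False
    return True


print(generalized_rule_matches_dp(6, Fraction(1, 2), Fraction(1, 20)))
```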